95 research outputs found

Patient-specific data fusion for cancer stratification and personalised treatment

Author: Gligorijević V
Malod-Dognin N
Pržulj N
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 04/01/2016

According to Cancer Research UK, cancer is a leading cause of death accounting for more than one in four of all deaths in 2011. The recent advances in experimental technologies in cancer research have resulted in the accumulation of large amounts of patient-specific datasets, which provide complementary information on the same cancer type. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and repurposing of drugs treating particular cancer patient groups. Our new framework is based on graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets. We apply our framework to ovarian cancer data to simultaneously cluster patients, genes and drugs by utilising all datasets. We demonstrate superior performance of our method over the state-of-the-art method, Network-based Stratification, in identifying three patient subgroups that have significant differences in survival outcomes and that are in good agreement with other clinical data. Also, we identify potential new driver genes that we obtain by analysing the gene clusters enriched in known drivers of ovarian cancer progression. We validated the top scoring genes identified as new drivers through database search and biomedical literature curation. Finally, we identify potential candidate drugs for repurposing that could be used in treatment of the identified patient subgroups by targeting their mutated gene products. We validated a large percentage of our drug-target predictions by using other databases and through literature curation.
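
As a rough illustration of the co-clustering idea behind non-negative matrix tri-factorization, the sketch below factorises a single non-negative matrix X (for example patients by genes) as X ≈ F S Gᵀ using standard multiplicative updates. It is a minimal, unregularised sketch and not the authors' framework: the graph regularisation and the fusion of several datasets described in the abstract are omitted, and the function name `nmtf`, its parameters and the toy matrix are assumptions made for this example.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Plain (unregularised) NMTF sketch: X ~= F @ S @ G.T, all factors >= 0.

    F (n x k1) holds row-cluster indicators, G (m x k2) column-cluster
    indicators, and S (k1 x k2) links row clusters to column clusters.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the Frobenius reconstruction error after each sweep.
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy matrix with two row blocks and two column blocks (hypothetical data).
X_toy = np.kron(np.eye(2), np.ones((3, 4)))
F_toy, S_toy, G_toy, err_toy = nmtf(X_toy, k1=2, k2=2)
row_clusters = F_toy.argmax(axis=1)  # cluster label per row
col_clusters = G_toy.argmax(axis=1)  # cluster label per column
print(row_clusters, col_clusters, err_toy[-1])
```

Cluster labels are read off as the largest entry in each row of F and G. Which label a block receives depends on the random initialisation.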

Spiral - Imperial College Digital Repository

Fuse: Multiple Network Alignment via Data Fusion

Author: Gligorijević V
Malod-Dognin N
Pržulj N
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/10/2015

Spiral - Imperial College Digital Repository

Integration of molecular network data reconstructs Gene Ontology.

Author: Gligorijević V
Janjić V
Pržulj N
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/08/2014

Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of baker’s yeast protein–protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GIs profiling.
Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Central

Spiral - Imperial College Digital Repository

Integrative methods for analyzing big data in precision medicine

Author: Gligorijević V
Malod-Dognin N
Pržulj N
Publication date: 17/12/2015

We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.

UCL Discovery

DDSL: Efficient Subgraph Listing on Distributed and Dynamic Graphs

Author: DB West
J Dean
JA Grochow
L Lai
M Qiao
N Pržulj
Publication date: 29/08/2020

Subgraph listing is a fundamental problem in graph theory and has wide applications in areas like sociology, chemistry, and social networks. Modern graphs are often both large-scale and highly dynamic, which challenges the efficiency of existing subgraph listing algorithms. Recent works have shown the benefits of partitioning and processing big graphs in a distributed system; however, only a few works target subgraph listing on dynamic graphs in a distributed environment. In this paper, we propose an efficient approach, called Distributed and Dynamic Subgraph Listing (DDSL), which can incrementally update the results instead of running from scratch. DDSL follows a general distributed join framework. In this framework, we use a Neighbor-Preserved storage for data graphs, which takes bounded extra space and supports dynamic updating. After that, we propose a comprehensive cost model to estimate the I/O cost of listing subgraphs. Then, based on this cost model, we develop an algorithm to find the optimal join tree for a given pattern. To handle dynamic graphs, we propose an efficient left-deep join algorithm to incrementally update the join results. Extensive experiments are conducted on real-world datasets. The results show that DDSL outperforms existing methods in dealing with both static and dynamic graphs in terms of response time.
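
The notion of updating results incrementally rather than recomputing from scratch can be illustrated on the simplest pattern, the triangle. The sketch below is a single-machine toy and not DDSL itself: it has no distributed join, cost model or Neighbor-Preserved storage, and the class name `DynamicTriangleLister` and its methods are hypothetical names chosen for this example. When an edge (u, v) is inserted or deleted, only triangles through the common neighbours of u and v can change, so only those are touched.

```python
from collections import defaultdict


class DynamicTriangleLister:
    """Maintains the set of triangles of an undirected graph under edge updates."""

    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of neighbours
        self.triangles = set()        # each triangle is a frozenset of 3 vertices

    def add_edge(self, u, v):
        """Insert edge (u, v) and return the triangles it newly creates."""
        if u == v or v in self.adj[u]:
            return set()              # ignore self-loops and duplicate edges
        # Every common neighbour w closes exactly one new triangle {u, v, w}.
        created = {frozenset((u, v, w)) for w in self.adj[u] & self.adj[v]}
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles |= created
        return created

    def remove_edge(self, u, v):
        """Delete edge (u, v) and return the triangles it destroys."""
        if v not in self.adj[u]:
            return set()              # edge not present, nothing changes
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        destroyed = {frozenset((u, v, w)) for w in self.adj[u] & self.adj[v]}
        self.triangles -= destroyed
        return destroyed


lister = DynamicTriangleLister()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
    lister.add_edge(*edge)
print(sorted(sorted(t) for t in lister.triangles))
```

Each update costs one set intersection over the two endpoint neighbourhoods, regardless of how many triangles the rest of the graph contains.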

arXiv.org e-Print Archive

Crossref

Topology-Function Conservation in Protein-Protein Interaction Networks.

Author: Davis D
Malod-Dognin N
Pržulj N
Stojmirovic A
Yaveroğlu ÖN
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/05/2015

Spiral - Imperial College Digital Repository

Topological network alignment uncovers biological function and phylogeny

Author: Cook S.
Flannick J.
Kuchaiev O.
Memišević V.
Nataša Pržulj
Oleksii Kuchaiev
Pržulj N.
Singh R.
Snijders T. A.
Tijana Milenković
Vesna Memišević
Wayne Hayes
Wentz-Hunter K.
Zhang Y.
Publication date: 07/10/2009

Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth. Comment: Algorithm explained in more detail. Additional analysis added.

arXiv.org e-Print Archive

Crossref

PubMed Central

UCL Discovery

Precision medicine ― A promising, yet challenging road lies ahead

Author: Malod-Dognin N
Petschnigg J
Pržulj N
Publication date: 01/02/2018

Precision medicine proposes to individualize the practice of medicine based on patients’ genetic backgrounds, their biomarker characteristics and other omics datasets. After outlining the key challenges in precision medicine, namely patient stratification, biomarker discovery and drug repurposing, we survey recent developments in high-throughput technologies and big biological datasets that shape the future of precision medicine. Furthermore, we provide an overview of recent data-integrative approaches that have been successfully used in precision medicine for mining medical knowledge from big biological data, and we highlight modeling and computing issues that such integrative approaches will face due to the ever-growing nature of big biological data. Finally, we draw attention to the challenges in translational medicine when moving from research findings to approved medical practices.

UCL Discovery

On the Feasibility of Malware Authorship Attribution

Author: A Rahimian
C Kruegel
DE Knuth
DI Holmes
EH Spafford
F Can
G Frantzeskou
I Krsul
J Ferrante
M Fowler
N Pržulj
N Rosenblum
S Alrabaee
S Burrows
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/01/2017

There are many occasions on which the security community is interested in discovering the authorship of malware binaries, either for digital forensics analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software code written by human programmers. Existing studies of authorship attribution of general purpose software mainly focus on source code, and are typically based on the style of programs and environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such program binaries often do not retain many semantic or stylistic features due to the compilation process. Therefore, authorship attribution in the domain of malware binaries based on features and styles that will survive the compilation process is challenging. This paper surveys the state of the art in this literature. Further, we analyze the features involved in those techniques. By using a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries. Comment: FPS 201

arXiv.org e-Print Archive

Crossref

Geometric De-noising of Protein-Protein Interaction Networks

Author: A Kumar
A Labarga
AC Gavin
AL Barabasi
AM Edwards
C Bishop
C Stark
C von Mering
D Higham
Desmond J. Higham
DS Han
F Abraham
G Bader
G Hart
G Mishra
GH Golub
H Chua
J Chen
J Rual
J Wang
J Yu
L Giot
M Kanehisa
M Penrose
Marija Rašajski
MS Lee
N Krogan
N Pržulj
Nataša Pržulj
O Kuchaiev
Oleksii Kuchaiev
P Erdös
P Uetz
R Colak
R Jansen
R Singh
S Collins
S Li
S Pitre
S Suthram
T Cox
T Ito
T Milenkovic
Teresa Maria Przytycka
TGO Consortium
U Stelzl
XW Chen
Y Ho
Z Ma
Publication venue: Public Library of Science
Publication date: 01/08/2009

Understanding complex networks of protein-protein interactions (PPIs) is one of the foremost challenges of the post-genomic era. Due to the recent advances in experimental bio-technology, including yeast-2-hybrid (Y2H), tandem affinity purification (TAP) and other high-throughput methods for PPI detection, huge amounts of PPI network data are becoming available. Of major concern, however, are the levels of noise and incompleteness. For example, for Y2H screens, it is thought that the false positive rate could be as high as 64%, and the false negative rate may range from 43% to 71%. TAP experiments are believed to have comparable levels of noise.

Crossref

University of Strathclyde Institutional Repository

Directory of Open Access Journals

PubMed Central

UCL Discovery

eScholarship - University of California